Skip to content

Node name fix #268

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 17, 2025
Merged

Node name fix #268

merged 11 commits into from
Jan 17, 2025

Conversation

davissp14
Copy link
Contributor

@davissp14 davissp14 commented Jan 15, 2025

PG Flex migration has historically been challenging due to the use of private IPs as node names in the repmgr.conf file. This configuration required converting the node name to the Machine ID, which is subsequently translated into {machine_id}.vm.{app_name}.internal. However, the node_name field in repmgr.conf has a 60-character limit, preventing the direct use of the DNS representation as the node name.

Using Machine IDs addresses this limitation. Machine IDs remain consistent during migrations, allowing us to achieve the goal of seamless migration for PG members.

Note: When upgrading, the primary node in HA setups may temporarily be identified as a zombie because it will not have the updated code necessary to translate Machine IDs. However, this condition will resolve automatically once the primary node itself is upgraded.

Other notable fixes

  • The pg_unregistration process will now evaluate the state of the cluster after the node/replication slot are removed. This should help address issues with primary being moved into a zombie state after scaling down from 3 members to 1.

@davissp14
Copy link
Contributor Author

Related change: superfly/flyctl#4168

@davissp14 davissp14 marked this pull request as ready for review January 16, 2025 18:33
@@ -225,14 +231,6 @@ func (n *Node) Init(ctx context.Context) error {

// PostInit are operations that need to be executed against a running Postgres on boot.
func (n *Node) PostInit(ctx context.Context) error {
if ZombieLockExists() {
Copy link
Contributor Author

@davissp14 davissp14 Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of returning, allow it to be re-screened as there are certain failure conditions that can be resolved automatically. If not, it will enter a crash loop until the machines max restarts are hit.

@davissp14 davissp14 merged commit 4d426fb into master Jan 17, 2025
6 checks passed
@davissp14 davissp14 deleted the node-name-fix branch January 17, 2025 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant